Toward Scene Recognition by Discovering Semantic Structures and Parts

Author

  • Guang-Tong Zhou
Abstract

Scene recognition is a fundamental and open problem in computer vision. It is an essential component of many real-world applications, including image search, robotics, and social media analysis. The key to success in scene recognition is a good understanding of the rich semantics embedded in scenes. For example, a scene containing sky, an airplane, a road, and a building is intuitively labeled as an airport. In this thesis, we identify two directions for exploiting scene semantics. On one hand, we advocate the discovery of scene parts that correspond to the various semantic components of a scene, such as objects and surfaces. On the other hand, we promote the discovery of scene structures that capture the spatial relations among scene parts, such as sky-above-airplane. By leveraging scene parts and structures in scene recognition, we are able to build strong recognition systems. Our contributions are two-fold. First, we propose two clustering algorithms for the data-driven discovery of semantics in visual data. Specifically, we develop latent maximum-margin clustering to model semantics as latent variables, and hierarchical maximum-margin clustering to discover tree-structured semantic hierarchies. Our second contribution is the development of two scene recognition methods that leverage scene structure discovery and part discovery. The first method recognizes a scene by treating the scene image as a structured collage of objects. The second method discovers scene parts that are both discriminative and representative for scene recognition.

Similar resources

Toward parts-based scene understanding with pixel-support parts-sparse pictorial structures

Scene understanding remains a significant challenge in the computer vision community. The visual psychophysics literature has demonstrated the importance of interdependence among parts of the scene. Yet, the majority of methods in computer vision remain local. Pictorial structures have arisen as a fundamental parts-based model for some vision problems, such as articulated object detection. Howe...

Semantic Information and Local Constraints for Parametric Parts in Interactive Virtual Construction

This paper introduces a semantic representation for virtual prototyping in interactive virtual construction applications. The representation reflects semantic information about dynamic constraints to define objects’ modification and construction behavior as well as knowledge structures supporting multimodal interaction utilizing speech and gesture. It is conveniently defined using XML-based mar...

The Role of Scene Gist and Spatial Dependency among Objects in the Semantic Guidance of Attention

A previous study (Hwang et al., 2011) found evidence for semantic guidance of visual attention during the inspection of real-world scenes, i.e., an influence of semantic relationships among scene objects on overt shifts of attention. In particular, the results revealed an observer bias toward gaze transitions between semantically similar objects. However, these results are not necessarily indic...

Learning Hybrid Part Filters for Scene Recognition

This paper introduces a new image representation for scene recognition, where an image is described based on the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models trained by latent SVM [1]. Since different objects may contain similar parts, we describe a method that uses a semantic hiera...

Semantic categorization precedes affective evaluation of visual scenes.

We compared the primacy of affective versus semantic categorization by using forced-choice saccadic and manual response tasks. Participants viewed paired emotional and neutral scenes involving humans or animals flashed rapidly in extrafoveal vision. Participants were instructed to categorize the targets by saccading toward the location occupied by a predefined target scene. The affective task i...

Publication date: 2015